Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor changes to support benchmarking #138

Merged
merged 5 commits into from
Jun 18, 2024
Merged

Minor changes to support benchmarking #138

merged 5 commits into from
Jun 18, 2024

Conversation

humpydonkey
Copy link
Collaborator

Minor changes to support benchmarking

  1. Tweak the tester prompt to return the output in a testing function
  2. Add a get_main_result() to Execution class
  3. Update langsmith to the latest version

@@ -179,6 +179,8 @@ def find_text(image_path: str, text: str) -> str:
8. DO NOT use try except block to handle the error, let the error be raised if the code is incorrect.
9. DO NOT import the testing function as it will available in the testing environment.
10. Print the output of the function that is being tested.
11. Use the output of the function that is being tested as the return value of the testing function.
12. Run the testing function in the end and don't assign a variable to its output.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to think of ways to keep the list of instructions shorter as I suspect as the list gets longer the ability to follow the directions get's poorer.

Would this also work if we removed instruction 10. (as it seems to accomplish the same thing as 11.) and combine 11. and 12. into "Return the output of the function that is being tested in the test script, do not assign it to another variable" or something shorter?

Copy link
Collaborator Author

@humpydonkey humpydonkey Jun 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried a few approaches. This actually is the best prompt so far.
Other approaches significantly increase the chance of the testing function return null.
Here are approaches i have tried:

  1. Removing 10
  2. Merge 11 + 12 into one instruction
  3. Rewrite 11 in a more concise way
  4. Move 11 + 12 to the 3th and 4th instruction
  5. Move all instructions up
    And some combinations of the above.

From my observation, this is not a problem, but shortening it can actually cause a big problem. i.e. the null values increases from 11% to 80% in my benchmark.
We probably want to rewrite/revisit this entire prompt at some point. It's a bit brittle now.

vision_agent/utils/execute.py Show resolved Hide resolved
Copy link
Member

@dillonalaird dillonalaird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@AsiaCao AsiaCao merged commit 6d2223b into main Jun 18, 2024
7 checks passed
@AsiaCao AsiaCao deleted the support-benchmark branch June 18, 2024 19:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants